Bridging Domains with Words: Opinion Analysis with Matrix Tri-factorizations
Abstract
With the explosion of user-generated Web 2.0 content in the form of blogs, wikis and discussion forums, the Internet has rapidly become a massive dynamic repository of public opinion on an unbounded range of topics. A key enabler of opinion extraction and summarization is sentiment classification: the task of automatically identifying whether a given piece of text expresses positive or negative opinion towards a topic of interest. Building high-quality sentiment classifiers using standard text categorization methods is challenging due to the lack of labeled data in a target domain. In this paper, we consider the problem of cross-domain sentiment analysis: can one, for instance, download rated movie reviews from rottentomatoes.com or IMDb discussion forums, learn linguistic expressions and sentiment-laden terms that generally characterize opinionated commentary, and then successfully transfer this knowledge to the target domain, thereby building high-quality sentiment models without manual effort? We outline a novel sentiment transfer mechanism based on constrained non-negative matrix tri-factorizations of term-document matrices in the source and target domains. The constrained matrix factorization framework naturally incorporates document labels via a least squares penalty incurred by a certain linear model and enables direct and explicit knowledge transfer across different domains. We obtain promising empirical results with this approach.
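For readers unfamiliar with the building block named in the abstract, the following is a minimal sketch of a plain non-negative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates for the squared Frobenius error. It is not the authors' algorithm: the label penalty and the cross-domain constraints described in the abstract are omitted, and the function name `tri_factorize`, its parameters, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def tri_factorize(X, k_terms, k_docs, n_iter=200, eps=1e-9, seed=0):
    """Illustrative (hypothetical) non-negative tri-factorization X ~ F @ S @ G.T.

    X is a non-negative term-document matrix (rows: terms, columns: documents).
    F groups terms, G groups documents, S links term groups to document groups.
    This sketch omits the label and transfer constraints mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    n_terms, n_docs = X.shape
    F = rng.random((n_terms, k_terms))
    S = rng.random((k_terms, k_docs))
    G = rng.random((n_docs, k_docs))
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor non-negative.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # Track the Frobenius reconstruction error after each sweep.
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors

# Toy term-document counts (made up): two loose blocks of co-occurring terms.
X = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 1.0],
    [0.0, 0.0, 4.0, 3.0],
    [0.0, 1.0, 3.0, 4.0],
    [1.0, 0.0, 2.0, 2.0],
])
F, S, G, errors = tri_factorize(X, k_terms=2, k_docs=2)
```

In a sentiment setting one would typically read the two columns of F as word-level polarity scores and the two columns of G as document-level polarity scores; how those factors are tied to labels and shared across domains is specific to the paper and is not reproduced here.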
Similar Resources
Discriminative Transfer Learning on Manifold
Collective matrix factorization has achieved remarkable success in document classification in the transfer learning literature. However, the learned latent factors still suffer from the divergence between different domains and thus are usually not discriminative for an appropriate assignment of category labels. Based on these observations, we impose a discriminative regression model over t...
Exploiting Associations between Word Clusters and Document Classes for Cross-Domain Text Categorization
Cross-domain text categorization aims at adapting the knowledge learnt from a labeled source domain to an unlabeled target domain, where the documents from the source and target domains are drawn from different distributions. However, in spite of the different distributions in raw word features, the associations between word clusters (conceptual features) and document classes may remain stab...
Riordan group approaches in matrix factorizations
In this paper, we consider an arbitrary binary polynomial sequence {A_n} and then give a lower triangular matrix representation of this sequence. As the main result, we obtain a factorization of the infinite generalized Pascal matrix in terms of this new matrix, using a Riordan group approach. Further, some interesting results and applications are derived.
Tensor Factorization towards Precision Medicine
Precision medicine initiatives come amid the rapid growth in quantity and variety of biomedical data, which exceeds the capacity of matrix oriented data representations and many current analysis algorithms. Tensor factorizations extend the matrix view to multiple modalities and support dimensionality reduction methods that identify latent groups of data for meaningful summarization of both feat...
A Word Vector and Matrix Factorization Based Method for Opinion Lexicon Extraction
Automatic opinion lexicon extraction has attracted lots of attention and many methods have thus been proposed. However, most existing methods depend on dictionaries (e.g., WordNet), which confines their applicability. For instance, the dictionary based methods are unable to find domain dependent opinion words, because the entries in a dictionary are usually domain-independent. There also exist ...
Publication date: 2010